Skip to content

Conversation

BrandonArp
Copy link
Member

Inject and extract dimensions in the MappingSource

@BrandonArp BrandonArp force-pushed the mapping_the_dimensions branch from 68f894c to 0f17b68 Compare March 2, 2018 17:55
final String replacedString =
RegexAndMapReplacer.replaceAll(matcher, metric.getKey(), replacement, record.getDimensions());

final int tagsStart = replacedString.indexOf(';');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting tags should be configurable based on the replacement. For backwards compatibility default should probably be not to extract; but I imagine you would prefer the other.

for (final Map.Entry<String, ? extends Metric> metric : record.getMetrics().entrySet()) {
boolean found = false;
for (final Map.Entry<Pattern, List<String>> findAndReplace : _findAndReplace.entrySet()) {
final Matcher matcher = findAndReplace.getKey().matcher(metric.getKey());
if (matcher.find()) {
for (final String replacement : findAndReplace.getValue()) {
merge(metric.getValue(), matcher.replaceAll(replacement), mergedMetrics);
final String replacedString =
RegexAndMapReplacer.replaceAll(matcher, metric.getKey(), replacement, record.getDimensions());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tag is used in the metric we should be able to configure to remove it from the dimension set. Default should probably be to remove.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should talk about this. It's gonna be a bit tricky.

* @param variables map of variables to include
* @return a string with replacement tokens replaced
*/
public static String replaceAll(final Matcher matcher, final String input, final String replace, final Map<String, String> variables) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should variables be an ImmutableMap? -- this also looks long enough to be chopped.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, why not? Fixed.

*/
public static String replaceAll(final Matcher matcher, final String input, final String replace, final Map<String, String> variables) {

matcher.reset();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not nuts about the state sharing between this method and the caller.

Copy link
Member Author

@BrandonArp BrandonArp Mar 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

lastMatchedIndex = matcher.end();
appendReplacement(matcher, replace, builder, variables);
found = matcher.find();
} while (found);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The matcher is for the metric pattern; what matches are you iterating over here? I am a little concerned you've combined two concepts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed. It kinda is, but it's also a fairly succinct way of specifying the intent.

*
* @author Brandon Arp (brandon dot arp at inscopemetrics dot com)
*/
public class RegexAndMapReplacerTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • final
  • What's the coverage this gives us?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Coverage is 100%

while (x < replacement.length() - 1) {
x++;
final char c = replacement.charAt(x);
if (c == '\\') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is \\ used to escape $ or should we just use $$? Either way, this needs to be documented in the class header.

Copy link
Member Author

@BrandonArp BrandonArp Mar 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented. Also note that \\ is just a java-ism for the literal '\'

String.format("Improper escaping in replacement, must not have trailing '\\' at col %d: %s", x, replacement));
}
final Character c = replacement.charAt(x);
if (c == '\\' || c == '$' || c == '}') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe with most open-close style formats you need to escape both. I understand it's strictly not necessary here, but I'd like to be consistent with say regex and bash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. Fixed.

}
}

private static int writeReplacementToken(final String replacement, final int offset, final StringBuilder output,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Assert.assertEquals(expected, stockResult);
} catch (final IllegalArgumentException ignored) { }
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should take this opportunity to document the mapping source in the README.md.

@BrandonArp BrandonArp force-pushed the mapping_the_dimensions branch 2 times, most recently from 953262e to 31c3ead Compare April 2, 2018 05:20
@@ -40,6 +40,27 @@ safe_command() {
fi
}

checksum() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.


/* package private */ TransformingObserver(
final TransformingSource source,
final Map<Pattern, List<String>> findAndReplace,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be ImmutableMap; no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and ImmutableList)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

super(builder);
_source = builder._source;

final ImmutableMap.Builder<Pattern, List<String>> findReplaceBuilder =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List => ImmutableList

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


// Merge the metrics in the record together
final Record record = (Record) event;
final Map<ImmutableMap<String, String>, Map<String, MergingMetric>> mergedMetrics = Maps.newHashMap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should eventually push the Key model class to this level? If so, can you add a TODO here? (if not, let me know what your thoughts are on it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I migrated to using Key instead. Not too hard.

boolean found = false;
final String metricName = metric.getKey();
for (final Map.Entry<Pattern, List<String>> findAndReplace : _findAndReplace.entrySet()) {
final Pattern metricPattern = findAndReplace.getKey();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this findAndReplacePattern?

The one thing I am not seeing here is how to select which metrics to apply which rules to. The difference between selecting a metrics and transforming it. While I realize that for metric name this can often be expressed with one regex, I suspect there is actually more to this pattern.

Specifically, I believe we'll need to provide "selection" of metrics based not only on name but also on dimension set. For example: contain key, containing key regex or containing key with value or regex of value; I don't see a point in just containing value or regex of value ... but that's probably my overly typed view of the world.

Implementing the full selection is not something you need to do now. However, I would like you to consider it in how the configuration is laid out. I would like to be able to add this without making non-backwards compatible changes to the configuration. To this end I believe we need to support a list of transformations. By default the transformation can match all, and then adding selectors can narrow those down. There is one question which is whether transformations are all applied or only some.

We can default either way, supporting the other can just be a flag away. Although I am concerned about how slippery the slope may be to a graph/tree representation (instead of a list). I would like to hear your thoughts on this one.

return ImmutableMap.copyOf(finalTags);
}

private void merge(final Metric metric, final String key,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap one, wrap all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* @param value true to replace existing dimension value
* @return This instance of {@link Builder}.
*/
public Builder setReplaceExisting(final Boolean value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: overwrite?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a TODO to refactor this into commons once there is a second use case for it.

*/
public final class RegexAndMapReplacer {
/**
* Replaces all instances of $n (where n is 0-9) with regex match groups, ${var} with regex capture group or variable (from map) 'var'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to maintain compatibility with the mapping sink. I think that there are a few other cases here which may be handy:

${env:<name>}
${prop:<name>}
${capture:<name|number>}
${dimension:<name>}

While the simplicity of precedence is appealing, I cannot come up with a consistent scheme that doesn't break down. The namespacing while somewhat more arduous on users is explicit and easy to debug in case of failure (versus precedence which can result in unexpected results).

If you feel strongly another way I'd like to hear your thoughts on this.

if (inReplaceBrackets) {
// Consume until we hit the }
while (x < replacement.length() - 1 && c != '}') {
if (c == '\\') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like regex capture groups, there should be no need to support escaped characters inside replacements unless you want to support dynamic replacements;

e.g.
${foo_${bar}}

This would certainly give insane flexibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing escaping does not allow for dynamic replacements like that. It does allow for "special" characters in the names of the dimension, however. For example, if someone decided to have "{a}" as a dimension name, for whatever reason. Moreover, it makes for more consistent rules about how to escape (especially important if you have multiple levels of escaping.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the prefix support (env, capture, dimension are supported now). Precedence is explicit in the ordering of the prefixed variable maps. Phew, that was a bit of work.

README.md Outdated
remove = [
baz
]
findAndReplace = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is just on metric name?

Three points:

  • It's unclear that the context of inject and remove is dimensions versus findAndReplace which is metrics.
  • I can see that we'll want to apply findAndReplace to dimensions not just metrics in a future change.
  • The find and replace is quite a bit more powerful than just a find and replace because it allows you to extract additional dimensions.

Suggestions:

  • rename inject to injectDimensions
  • rename remove to removeDimensions
  • rename findAndReplace to transformMetric
  • rename overwriteExisting to overwrite
  • document the default value for overwrite

I'll look for this later, but I assume the semi-colon can be escaped if there is a literal one required.

In general, I am concerned that we're using an "expression" like language within in string instead of just specifying the control structure.

e.g.

transformMetric = [
    [
        match = "extract/([^/]*)/thing"
        replace = "extract/thing/${other_dimension}"
        injectDimensions = [
            my_dimension {
                value = "${1}"
                overwrite = true
             }
        ]
    ]
]

This is much more verbose, but there is less parsing, less code, and arguably less ambiguity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change the names. I don't think the semicolon can be escaped. I doubt that's something we'll really see, but I can add that functionality if you really feel strongly about it. Also, should I then be allowing escaping the '='?

I agree that your configuration example is more powerful (in most cases), but it adds significantly more processing (even after parsing is considered). If you want to add multiple dimensions, you need to run each dimension value through that same replacement, with the same context from the match. Also, you'll lose the ability to do something like "some/([^/]/([^/]/val;${1}=${2}" where you can have dynamic dimension keys. Moving the inject and remove dimensions into the metric transformation will likely be a significant code change for, IMHO, little value and significantly increased cost of use (configuration, processing time).

README.md Outdated
tranformations is a list of TransformationSet objects. Each TransformationSet has an inject (Map\<String, DimensionInjection\>),
remove (List\<String\>) and a findAndReplace (Map\<String, List\<String\>\>).

A DimensionInjection is just a value and a boolean of whether or not to overwrite existing values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overwrite should have a default, probably true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. Documented.


This namespacing prevents the built-in precendence search for a matching variable.

__Note: Any dimensions matched as part of a replacement will be removed from the resulting metric.__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems somewhat surprising. Shouldn't it be kept, and then you can add it to remove if you want to drop it? I understand that's common behavior, but it's going to be harder to express using it and restoring it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of dimensions is not on a per-metric basis. The removal applies to all metrics. As for preserving it, it's as simple as "metric/${consumed};consumed=${consumed}". Basically, add it as a dimension with the same value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that's a weird scoping; why would the removal apply to all metrics but when the extraction only applies to some?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unclear and somewhat surprising since the rule only applies to a particular set of metrics based on the regex match.

Dimensions can also be injected into a metric. To do this, add a ';' after the replacement name of the metric
and proceed to specify key=value pairs (where both the key and value can use variables). Basically, the
replacement is processed, the string split on ';' with the first part being the metric name and anything
after the ';' being parsed as key=value pairs to be added to the metric's dimensions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do they replace? What if I don't want them to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you didn't want them to replace, why would you put them there?

* @param variables map of variables to include
* @return a string with replacement tokens replaced
*/
public static Replacement replaceAll(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation looks off.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.... wtf? Fixed.

final String variableName) {
if (prefix.equals("capture")) {
if (isNumeric(variableName)) {
final Integer groupIndex = Integer.valueOf(variableName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about about numeric being something other than a capture group index and also the function calling this already special cased this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ This can also throw or be invalid if it overflows.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}
}

// Only record variables that are not captures
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you only need to record variables that are dimensions. However, see my comments on the README about the default to remove.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RegexAndMapReplacer doesn't know about the concept of dimensions. I was hoping to keep this class separate from the implementation details of the source.

// Only record variables that are not captures
variablesUsedBuilder.add(String.format("%s:%s", prefix, variableName));

final Map<String, String> variableMap = variables.get(prefix);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the variableMap contain the environment variables too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if you've added them to the variables Map<Map<String, String>>. But there is no special delineation here.

final String variableName) {
// First try the capture group with the name
try {
return matcher.group(variableName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are intent on keeping the numeric representation, I think that you need to push it down to this level to ensure it's used in the correct precedence order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I didn't consider the case where an environment var, for example, was a number.

* Describes the replacement string and variables used in it's creation.
*
* The "replacement" field is the resulting string.
* The "variablesMatched" field is a list of input variables that were matched, in prefix:variable form
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not nuts about maintaining the internal data structures with the prefixes. The reason is that it means parsing and escaping the colons in more than one place. Have you considered just using the prefix to refer to a different input map? (e.g. map of maps where first key is prefix -- for example)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I'll return that same map structure as gets passed in.

@BrandonArp BrandonArp force-pushed the mapping_the_dimensions branch from 0fe05d9 to 35b351d Compare April 13, 2018 17:53
tranformations is a list of TransformationSet objects. Each TransformationSet has an inject (Map\<String, DimensionInjection\>),
remove (List\<String\>) and a findAndReplace (Map\<String, List\<String\>\>).

A DimensionInjection is just a value and a boolean of whether or not to overwrite existing values. Of omitted, overwrite defaults to true.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type-o: Of => If

}
```

tranformations is a list of TransformationSet objects. Each TransformationSet has an inject (Map\<String, DimensionInjection\>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t => T

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of escaping, why not put it in ticks for inline code?


This namespacing prevents the built-in precendence search for a matching variable.

__Note: Any dimensions matched as part of a replacement will be removed from the resulting metric.__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unclear and somewhat surprising since the rule only applies to a particular set of metrics based on the regex match.

variablesMap.put("dimension", record.getDimensions());
variablesMap.put("env", System.getenv());

for (TransformationSet transformation : _transformations) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that the transformations should be treated as a single context. As a single context there can be only a single set of dimensions. In the case of say CollectD this is not necessarily the case. Consider cpu extracting core into a dimension; that doesn't apply to memory (were they in the same UOW). I think each transformation needs to create its own UOW from the existing one with the matching metrics and the dimensions that are elected. If we want to join them back (assuming dimension sets match) that's a possible optimization. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To elaborate:

  • There is the incoming UOW
  • Metrics matching a transformation are transformed into a new UOW which inherits its dimensions from the parent (post inject and remove) along with any injections/removals from the transform
  • Any metrics included in a new UOW are removed from the parent UOW
  • Metrics matching multiple UOWs are duplicated across them (subsequent changes adding separate matching rules to transformers can make this more elaborate)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's essentially what we're doing with the merge map.

// Remove the dimensions that we consumed in the replacement
remove.forEach(finalTags::remove);
transformation.getRemoveDimensions().forEach(finalTags::remove);
transformation.getInjectDimensions().forEach(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should document that inject trumps remove and both of these trump anything extracted in the transform and subsequently removed. However, I'm a little concerned about the way this is done. Specifically, the way these can compound. Depending on your feedback on the independent UOWs from each transformation this may be a discussion we should have.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how this is done. 1) remove the consumed dimensions, 2) remove the explicitly removed dimensions, 3) inject the explicitly injected dimensions (honoring overwrite), 4) inject the extracted dimensions. Not sure what you think the order should be.

/**
* Represents a dimension to inject and whether or not it should overwrite the existing value (if any).
*/
public static final class DimensionInjection {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be marked @Loggable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

*
* @author Brandon Arp (brandon dot arp at inscopemetrics dot com)
*/
public static final class TransformationSet {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be marked @Loggable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

final StringBuilder tokenBuilder,
final Map<String, ImmutableList.Builder<String>> variablesUsed) {
boolean inReplaceBrackets = false;
boolean tokenNumeric = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

replaceToken));
}
if (prefixSeparatorIndex != -1) {
prefix = replaceToken.substring(0, prefixSeparatorIndex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if I have ${:foo}; won't that be processed as non-prefixed? Where is prefix validation done?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(empty prefix is not the same as no prefix)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way these are processed, empty prefix is the same as no prefix.

@BrandonArp BrandonArp force-pushed the mapping_the_dimensions branch from 327029f to 92454bf Compare April 19, 2018 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants